Mathematical Programming in Machine Learning and Data Mining
Abstract
The field of Machine Learning (ML) and Data Mining (DM) centers on the following problem: given a data domain D, we want to approximate an unknown function y(x) on a given data set X ⊂ D (for which the values of y(x) may or may not be known) by a function f from a given class F so that the approximation generalizes as well as possible to all of the (unseen) data x ∈ D. The approximating function f might take real values, as in regression; binary values, as in classification; or integer values, as in some cases of ranking. It might also be a mapping between ordered subsets of data points and ordered subsets of real, integer or binary values, as in structured object prediction. The quality of the approximation by f can be measured by various objective functions. For instance, in support vector machine (SVM) [4] classification the quality of the approximating function is estimated by a weighted sum of a regularization term h(f) and the hinge loss term ∑_{x∈X} max{1 − y(x)f(x), 0}. Hence, many machine learning problems can be posed as optimization problems in which the optimization is performed over a given class F for a chosen objective. The connection between optimization and machine learning, although always present, became especially evident with the popularity of SVMs [4], [24] and of kernel methods in general [18]. The SVM classification problem is formulated as a convex quadratic program.
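The SVM objective described in the abstract can be illustrated with a minimal sketch. It assumes a linear model f(x) = w·x + b, labels y in {−1, +1}, and the regularizer h(f) = ½‖w‖², none of which the abstract fixes. The names `svm_objective`, `train_svm_subgradient`, `reg`, `steps` and `lr` are hypothetical, and plain subgradient descent stands in here for a quadratic programming solver, so this is an illustration of the objective rather than the authors' method.

```python
import numpy as np


def svm_objective(w, b, X, y, reg=1.0):
    """Weighted sum of the regularizer 0.5*||w||^2 and the total hinge loss.

    Assumes a linear model f(x) = w.x + b and labels y in {-1, +1}.
    """
    margins = y * (X @ w + b)
    hinge = np.maximum(1.0 - margins, 0.0).sum()
    return 0.5 * reg * np.dot(w, w) + hinge


def train_svm_subgradient(X, y, reg=1.0, steps=2000, lr=0.01):
    """Minimize svm_objective by subgradient descent (illustrative only)."""
    n_features = X.shape[1]
    w = np.zeros(n_features)
    b = 0.0
    for t in range(1, steps + 1):
        margins = y * (X @ w + b)
        active = margins < 1.0  # points with a nonzero hinge term
        # Subgradient of the objective with respect to w and b.
        grad_w = reg * w - (y[active, None] * X[active]).sum(axis=0)
        grad_b = -y[active].sum()
        step = lr / np.sqrt(t)  # diminishing step size
        w -= step * grad_w
        b -= step * grad_b
    return w, b
```

On a small separable data set, calling `train_svm_subgradient(X, y)` and then evaluating `svm_objective` at the result should give a value well below the objective at w = 0, b = 0, which equals the number of training points. Because the objective is convex, a dedicated quadratic programming solver would reach the same minimum value more reliably than this fixed-schedule loop.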
Related resources
Presenting a multi-objective mathematical optimization model for classification
In this paper we investigate data classification (one of the branches of data mining) in the form of a multi-objective mathematical programming model. The model that we present and investigate is a MODM problem. Based on the support vector machine (SVM) idea of maximizing the margin between two groups, a multi-criteria mathematical programming model was first proposed for data m...
Development of an Ensemble Multi-stage Machine for Prediction of Breast Cancer Survivability
Prediction of cancer survivability using machine learning techniques has become a popular approach in recent years. An important issue in this regard is that obtaining some features may require difficult and costly experiments, even though these features have little impact on the final decision and can be dropped from the feature set. Therefore, developing a machine for p...
Forecasting Stock Price Movements Based on Opinion Mining and Sentiment Analysis: An Application of Support Vector Machine and Twitter Data
Today, social networks are fast and dynamic communication intermediaries that are a vital business tool. This study aims at examining the views of those involved with Facebook stocks so that we can summarize their views to predict the general behavior of this stock and collectively consider possible Facebook stock price movements, and create a more accurate pattern compared to previous patterns...
Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining
This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...